Domain-specific Stop Words in Malaysian Parliamentary Debates 1959 – 2018
Abstract
Removal of stop words is essential in Natural Language Processing and text-related analysis. Existing work on Malay stop words is based on standard Malay and on Quranic/Arabic translations into Malay. Thus, there is a lack of a domain-specific stop word list, making existing lists discordant for processing Malay parliamentary discourse. In this paper, we propose a semantic approach towards identifying and removing Malay, conventional Malay spelling and English functional words in analysing a time-series corpus, namely the Malaysian Hansard Corpus (MHC), to extract a specific-domain stop word list. The study utilised a combination of the Z-method, most frequently occurring words, words that appear once, and the classic method. The dataset of the corpus evaluated comprised Parliament 1 (year 1959) to Parliament 13 (year 2018). The study then categorised the stop word list according to related words. The resulting list contains 587 stop words. New stop words that emerged from the MHC include parliamentary-related words like ‘Yang Berhormat’ (salutation for members of Parliament), ‘Yang di-Pertua’ (Speaker of the House), ‘ketawa’ (laugh) and ‘tepuk’ (clap). Other than typical English stop words like ‘and’ and ‘the’, the list also includes ‘hon’ble’ (short for ‘Honourable’) and ‘honourable’. It also includes conventional Malay spellings like ‘untok’ (for), ‘lebeh’ (more) and ‘kapada’ (to). The proposed set of stop words can further assist natural language processing of text.
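As a rough illustration of two of the frequency-based criteria named in the abstract (most frequently occurring words and words that appear once), the following minimal Python sketch collects both kinds of candidates from a list of texts. It is not the authors' pipeline: the function name `candidate_stop_words`, the `top_n` parameter, the tokenisation rule and the toy sentences are all assumptions made for this example, and the Z-method and the classic method are not covered.

```python
from collections import Counter
import re


def candidate_stop_words(documents, top_n=2):
    """Return (most_frequent, hapaxes) stop word candidates.

    Hypothetical helper for illustration only; the paper's actual
    procedure (including the Z-method) is not reproduced here.
    """
    counts = Counter()
    for text in documents:
        # Lowercase, then keep alphabetic tokens; an internal hyphen or
        # straight apostrophe is allowed so forms like "di-pertua" stay whole.
        counts.update(re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower()))
    # Criterion 1: the top_n most frequently occurring words.
    most_frequent = [word for word, _ in counts.most_common(top_n)]
    # Criterion 2: words that appear exactly once in the whole collection.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    return most_frequent, hapaxes


# Toy sentences (invented for this sketch, not taken from the MHC).
toy = [
    "yang berhormat untok yang kapada",
    "yang berhormat lebeh yang berhormat",
]
frequent, once = candidate_stop_words(toy, top_n=2)
print(frequent)  # most frequent candidates
print(once)      # words seen exactly once
```

In practice both candidate sets would still need manual review, since a word that is frequent or rare in a corpus is not necessarily a functional word.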
Similar resources
Exemelification of Parliamentary Debates
Parliamentary debates are an interesting domain to apply state-of-the-art information retrieval technology. Parliamentary debates are highly structured transcripts of meetings of politicians in parliament. These debates are an important part of the cultural heritage of countries; they are often free of copy-right; citizens often have a legal right to inspect them; and several countries make gre...
Modelling argumentation in parliamentary debates
In this paper we apply the information state update (ISU) machinery to tracking and understanding the argumentative behaviour of participants in a parliamentary debate in order to predict its outcome. We propose to use the ISU approach to model the arguments of the debaters and the support/attack links between them as part of the formal representations of a participant’s information state. We f...
Advanced Information Access to Parliamentary Debates
Parliamentary debates are highly structured transcripts of meetings of politicians in parliament. These debates are an important part of the cultural heritage of many countries; they are often free of copy-right; citizens often have a legal right to inspect them; and several countries make great effort to digitize their entire historical collection and make it available to the general public. T...
Bringing parliamentary debates to the Semantic Web
An analysis of parliamentary debates and media resources that cover them can provide insight into the political climate of a country. Although debates are now regularly published on official government portals, their analysis remains a cumbersome and challenging task for historians and political scientists. One of the main tasks of the PoliMedia project is to allow easy crossmedia comparisons a...
Journal
Journal title: GEMA Online Journal of Language Studies
Year: 2021
ISSN: 2550-2131, 1675-8021
DOI: https://doi.org/10.17576/gema-2021-2102-01